IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

Guo, Wenxuan, Xu, Xiuwei, Yin, Hang, Wang, Ziwei, Feng, Jianjiang, Zhou, Jie, Lu, Jiwen

arXiv.org Artificial Intelligence

Visual navigation with an image as the goal is a fundamental and challenging problem. Conventional methods rely either on end-to-end reinforcement learning or on modular policies with a topological graph or BEV map as memory, neither of which can fully model the geometric relationship between the explored 3D environment and the goal image. To localize the goal image in 3D space efficiently and accurately, we build our navigation system on the renderable 3D Gaussian Splatting (3DGS) representation. However, due to the computational intensity of 3DGS optimization and the large search space of 6-DoF camera poses, directly leveraging 3DGS for image localization during the agent's exploration is prohibitively inefficient. To this end, we propose IGL-Nav, an Incremental 3D Gaussian Localization framework for efficient and 3D-aware image-goal navigation. Specifically, we incrementally update the scene representation as new images arrive using feed-forward monocular prediction. We then coarsely localize the goal by leveraging geometric information for discrete-space matching, which is equivalent to an efficient 3D convolution. When the agent is close to the goal, we solve for the fine target pose by optimization via differentiable rendering. The proposed IGL-Nav outperforms existing state-of-the-art methods by a large margin across diverse experimental configurations. It also handles the more challenging free-view image-goal setting and can be deployed on a real-world robotic platform, using a cellphone to capture the goal image at an arbitrary pose. Project page: https://gwxuan.github.io/IGL-Nav/.
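The coarse localization step described above, matching the goal image's features against a discretized scene, can be pictured as scoring every cell of a 3D feature grid against a goal descriptor. The following is a minimal numpy sketch of that idea, not the authors' implementation: `coarse_localize`, the grid shapes, and the dot-product score are all illustrative assumptions, and the per-voxel dot product is what makes the operation equivalent to a 1x1x1 3D convolution with the goal descriptor as the kernel.

```python
import numpy as np

def coarse_localize(scene_feats, goal_feat):
    """scene_feats: (D, H, W, C) per-voxel features; goal_feat: (C,) descriptor.
    Returns the (d, h, w) index of the best-matching voxel."""
    # Dot-product similarity at every voxel == a 3D conv with a 1x1x1 kernel.
    scores = np.tensordot(scene_feats, goal_feat, axes=([3], [0]))  # (D, H, W)
    idx = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in idx)

# Toy usage: plant the goal descriptor at a known voxel and recover it.
rng = np.random.default_rng(0)
scene = rng.normal(size=(4, 4, 4, 8))
goal = np.zeros(8)
goal[0] = 10.0
scene[2, 1, 3] = goal  # embed a strong match at a known location
print(coarse_localize(scene, goal))  # -> (2, 1, 3)
```

In the real system the "kernel" would be a learned goal embedding and the grid an incrementally built 3DGS scene; the sketch only shows why discrete matching can reuse convolution machinery.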


Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation

Deng, Yijie, Yuan, Shuaihang, Bethala, Geeta Chandra Raju, Tzes, Anthony, Liu, Yu-Shen, Fang, Yi

arXiv.org Artificial Intelligence

Instance Image-Goal Navigation (IIN) requires autonomous agents to identify and navigate to a target object or location depicted in a reference image captured from any viewpoint. While recent methods leverage powerful novel view synthesis (NVS) techniques, such as three-dimensional Gaussian splatting (3DGS), they typically rely on randomly sampling multiple viewpoints or trajectories to ensure comprehensive coverage of discriminative visual cues. This approach, however, creates significant redundancy through overlapping image samples and lacks principled view selection, substantially increasing both rendering and comparison overhead. In this paper, we introduce a novel IIN framework with a hierarchical scoring paradigm that estimates optimal viewpoints for target matching. Our approach integrates cross-level semantic scoring, utilizing CLIP-derived relevancy fields to identify regions with high semantic similarity to the target object class, with fine-grained local geometric scoring that performs precise pose estimation within promising regions. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on simulated IIN benchmarks and shows real-world applicability.
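The hierarchy described above amounts to a cheap score pruning candidates before an expensive score is computed. A small illustrative sketch, with `semantic_score` and `geometric_score` as stand-ins for the CLIP relevancy field and the local pose-estimation check (none of these names come from the paper):

```python
import numpy as np

def hierarchical_select(viewpoints, semantic_score, geometric_score, top_k=3):
    # Level 1: rank all viewpoints by the cheap semantic score.
    sem = np.array([semantic_score(v) for v in viewpoints])
    candidates = np.argsort(sem)[::-1][:top_k]
    # Level 2: run the expensive geometric check on the top-k survivors only.
    geo = {i: geometric_score(viewpoints[i]) for i in candidates}
    return max(geo, key=geo.get)

views = [0.0, 1.0, 2.0, 3.0, 4.0]            # stand-in viewpoint parameters
best = hierarchical_select(
    views,
    semantic_score=lambda v: -abs(v - 3.0),  # coarse score peaks near v = 3
    geometric_score=lambda v: -abs(v - 2.0), # fine check prefers v = 2
)
print(views[best])  # -> 2.0
```

The point of the structure is cost: the geometric score runs only `top_k` times instead of once per viewpoint, which is the redundancy the paper's paradigm is designed to avoid.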


Image-Goal Navigation Using Refined Feature Guidance and Scene Graph Enhancement

Feng, Zhicheng, Chen, Xieyuanli, Shi, Chenghao, Luo, Lun, Chen, Zhichao, Liu, Yun-Hui, Lu, Huimin

arXiv.org Artificial Intelligence

In this paper, we introduce a novel image-goal navigation approach, named RFSG. Our focus lies in leveraging the fine-grained connections between goals, observations, and the environment within limited image data, all the while keeping the navigation architecture simple and lightweight. To this end, we propose the spatial-channel attention mechanism, enabling the network to learn the importance of multi-dimensional features to fuse the goal and observation features. In addition, a self-distillation mechanism is incorporated to further enhance the feature representation capabilities. Given that the navigation task needs surrounding environmental information for more efficient navigation, we propose an image scene graph to establish feature associations at both the image and object levels, effectively encoding the surrounding scene information. Cross-scene performance validation was conducted on the Gibson and HM3D datasets, and the proposed method achieved state-of-the-art results among mainstream methods, with a speed of up to 53.5 frames per second on an RTX 3080. This contributes to the realization of end-to-end image-goal navigation in real-world scenarios. The implementation and model of our method have been released at: https://github.com/nubot-nudt/RFSG.
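To make the data flow of spatial-channel attention concrete, here is a minimal numpy sketch: a channel gate computed from globally pooled responses followed by a per-pixel spatial gate, applied to additively fused goal and observation features. The real RFSG module is learned; the fixed pooling-based gates here are assumptions, purely to illustrate how the two attention axes compose.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_channel_fuse(goal_feat, obs_feat):
    """goal_feat, obs_feat: (C, H, W). Returns fused (C, H, W) features."""
    fused = goal_feat + obs_feat                  # simple additive fusion
    # Channel attention: gate each channel by its global average response.
    ch_gate = sigmoid(fused.mean(axis=(1, 2)))    # (C,)
    fused = fused * ch_gate[:, None, None]
    # Spatial attention: gate each location by its mean across channels.
    sp_gate = sigmoid(fused.mean(axis=0))         # (H, W)
    return fused * sp_gate[None, :, :]

out = spatial_channel_fuse(np.ones((8, 4, 4)), np.ones((8, 4, 4)))
print(out.shape)  # -> (8, 4, 4)
```

A learned version would replace the pooled means with small MLPs or convolutions producing the gates, but the shapes and the channel-then-spatial ordering would look the same.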


FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation

Neural Information Processing Systems

Learning to navigate to an image-specified goal is an important but challenging task for autonomous systems like household robots. The agent is required to understand and reason about the location of the navigation goal from a picture taken at the goal position. Existing methods try to solve this problem by learning a navigation policy that captures semantic features of the goal image and the observation image independently and finally fuses them to predict a sequence of navigation actions. However, these methods suffer from two major limitations. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation. In particular, we leverage fine-grained and high-resolution feature maps of the goal image as prompts to perform conditioned embedding, which preserves detailed information in the goal image and guides the observation encoder to pay attention to goal-relevant regions.
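Conditioned embedding of the kind the abstract describes can be sketched in a FiLM-like form: the goal features predict per-channel scale and shift terms that modulate the observation features. This is an illustrative assumption about the mechanism, not FGPrompt's actual architecture; `condition_on_goal` and the random weight matrices stand in for the learned conditioning layers.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 16  # number of feature channels (illustrative)

def condition_on_goal(obs_feat, goal_feat, w_scale, w_shift):
    """obs_feat, goal_feat: (C, H, W). Returns goal-conditioned obs features."""
    g = goal_feat.mean(axis=(1, 2))       # (C,) pooled goal descriptor
    scale = 1.0 + w_scale @ g             # (C,) per-channel scale
    shift = w_shift @ g                   # (C,) per-channel shift
    # Modulate every spatial location of the observation features.
    return obs_feat * scale[:, None, None] + shift[:, None, None]

obs = rng.normal(size=(C, 8, 8))
goal = rng.normal(size=(C, 8, 8))
w_scale = 0.1 * rng.normal(size=(C, C))  # stand-ins for learned weights
w_shift = 0.1 * rng.normal(size=(C, C))
out = condition_on_goal(obs, goal, w_scale, w_shift)
print(out.shape)  # -> (16, 8, 8)
```

The key property the abstract emphasizes, that high-resolution goal detail steers the observation encoder, would correspond to conditioning at multiple encoder stages rather than the single pooled modulation shown here.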


Transformers for Image-Goal Navigation

Pelluri, Nikhilanj

arXiv.org Artificial Intelligence

Autonomous navigation is a critical capability for modern mobile robots and has been extensively studied over several decades. Classical approaches to navigation rely on constructing detailed maps of the environment and accurately localizing the robot within the map [1, 2, 3]. However, with increasing demand for deploying robots in novel uncontrolled environments such as households, last-mile delivery, etc., constructing accurate and fine-grained maps is often impractical. Robots must now be able to navigate without maps, which means efficient navigation policies require accurate semantic understanding of the scene, efficient exploration and episodic memory, and long-horizon planning with limited knowledge of the environment. Advances in scene understanding have led to semantic navigation tasks such as image-goal navigation [4, 5] and object-goal navigation [6, 7] receiving significant focus in recent years. In this work, we consider the specific task of image-goal navigation, where the robot's navigation objective is specified by an RGB image. We motivate the task with the following scenario: consider a mobile household robot equipped with an onboard camera, tasked with picking up a novel unseen object (say, a new shirt). Since the robot has no prior knowledge about the novel object, it would need other semantic information to understand the object - an image of the object serves this purpose effectively.


Last-Mile Embodied Visual Navigation

Wasserman, Justin, Yadav, Karmesh, Chowdhary, Girish, Gupta, Abhinav, Jain, Unnat

arXiv.org Artificial Intelligence

Realistic long-horizon tasks like image-goal navigation involve exploratory and exploitative phases. Assigned with an image of the goal, an embodied agent must explore to discover the goal, i.e., search efficiently using learned priors. Once the goal is discovered, the agent must accurately calibrate the last-mile of navigation to the goal. As with any robust system, switches between exploratory goal discovery and exploitative last-mile navigation enable better recovery from errors. Following these intuitive guide rails, we propose SLING to improve the performance of existing image-goal navigation systems. Entirely complementing prior methods, we focus on last-mile navigation and leverage the underlying geometric structure of the problem with neural descriptors. With simple but effective switches, we can easily connect SLING with heuristic, reinforcement learning, and neural modular policies. On a standardized image-goal navigation benchmark (Hahn et al. 2021), we improve performance across policies, scenes, and episode complexity, raising the state-of-the-art from 45% to 55% success rate. Beyond photorealistic simulation, we conduct real-robot experiments in three physical scenes and find these improvements to transfer well to real environments.
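The explore/exploit switch at the heart of SLING can be illustrated with a toy detection signal: count mutual nearest-neighbor matches between descriptors of the current view and the goal image, and hand control to the last-mile policy once enough matches appear. This is a hedged sketch under assumed names (`match_keypoints`, `select_phase`, the thresholds); the actual system uses learned neural descriptors and estimates a relative pose from the correspondences rather than just counting them.

```python
import numpy as np

def match_keypoints(obs_desc, goal_desc, thresh=0.9):
    """Count mutual nearest-neighbor matches between (N, D) descriptor sets."""
    sim = obs_desc @ goal_desc.T          # cosine-style similarity scores
    fwd = sim.argmax(axis=1)              # best goal match for each obs point
    bwd = sim.argmax(axis=0)              # best obs match for each goal point
    mutual = [i for i in range(len(obs_desc))
              if bwd[fwd[i]] == i and sim[i, fwd[i]] > thresh]
    return len(mutual)

def select_phase(obs_desc, goal_desc, min_matches=4):
    # Switch to last-mile navigation only once the goal is confidently seen.
    n = match_keypoints(obs_desc, goal_desc)
    return "last_mile" if n >= min_matches else "explore"

desc = np.eye(8)  # identical views: every descriptor matches its twin
print(select_phase(desc, desc))                            # -> last_mile
print(select_phase(desc, np.roll(desc, 1, axis=1) * 0.5))  # -> explore
```

The simple, explicit switch is what lets a module like this wrap around heuristic, reinforcement learning, or modular exploration policies without modifying them.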